{
  "cells": [
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "ur8xi4C7S06n"
      },
      "outputs": [],
      "source": [
        "# Copyright 2025 Google LLC\n",
        "#\n",
        "# Licensed under the Apache License, Version 2.0 (the \"License\");\n",
        "# you may not use this file except in compliance with the License.\n",
        "# You may obtain a copy of the License at\n",
        "#\n",
        "#     https://www.apache.org/licenses/LICENSE-2.0\n",
        "#\n",
        "# Unless required by applicable law or agreed to in writing, software\n",
        "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
        "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
        "# See the License for the specific language governing permissions and\n",
        "# limitations under the License."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "JAPoU8Sm5E6e"
      },
      "source": [
        "# Serving Open-Source LLMs on Vertex AI with LiteLLM and OpenAI-Compatible APIs\n",
        "\n",
        "<table align=\"left\">\n",
        "  <td style=\"text-align: center\">\n",
        "    <a href=\"https://colab.research.google.com/github/GoogleCloudPlatform/generative-ai/blob/main/tree/main/open-models/use-cases/model_garden_litellm_inference.ipynb\">\n",
        "      <img width=\"32px\" src=\"https://www.gstatic.com/pantheon/images/bigquery/welcome_page/colab-logo.svg\" alt=\"Google Colaboratory logo\"><br> Open in Colab\n",
        "    </a>\n",
        "  </td>\n",
        "  <td style=\"text-align: center\">\n",
        "    <a href=\"https://console.cloud.google.com/vertex-ai/colab/import/https:%2F%2Fraw.githubusercontent.com%2FGoogleCloudPlatform%2Fgenerative-ai%2Fmain%2Fopen-models%2Fuse-cases%2Fmodel_garden_litellm_inference.ipynb\">\n",
        "      <img width=\"32px\" src=\"https://lh3.googleusercontent.com/JmcxdQi-qOpctIvWKgPtrzZdJJK-J3sWE1RsfjZNwshCFgE_9fULcNpuXYTilIR2hjwN\" alt=\"Google Cloud Colab Enterprise logo\"><br> Open in Colab Enterprise\n",
        "    </a>\n",
        "  </td>\n",
        "  <td style=\"text-align: center\">\n",
        "    <a href=\"https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/GoogleCloudPlatform/generative-ai/main/open-models/use-cases/model_garden_litellm_inference.ipynb\">\n",
        "      <img src=\"https://www.gstatic.com/images/branding/gcpiconscolors/vertexai/v1/32px.svg\" alt=\"Vertex AI logo\"><br> Open in Vertex AI Workbench\n",
        "    </a>\n",
        "  </td>\n",
        "  <td style=\"text-align: center\">\n",
        "    <a href=\"https://github.com/GoogleCloudPlatform/generative-ai/blob/main/tree/main/open-models/use-cases/model_garden_litellm_inference.ipynb\">\n",
        "      <img width=\"32px\" src=\"https://raw.githubusercontent.com/primer/octicons/refs/heads/main/icons/mark-github-24.svg\" alt=\"GitHub logo\"><br> View on GitHub\n",
        "    </a>\n",
        "  </td>\n",
        "</table>\n",
        "\n",
        "<div style=\"clear: both;\"></div>\n",
        "\n",
        "<b>Share to:</b>\n",
        "\n",
        "<a href=\"https://www.linkedin.com/sharing/share-offsite/?url=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/open-models/use-cases/model_garden_litellm_inference.ipynb\" target=\"_blank\">\n",
        "  <img width=\"20px\" src=\"https://upload.wikimedia.org/wikipedia/commons/8/81/LinkedIn_icon.svg\" alt=\"LinkedIn logo\">\n",
        "</a>\n",
        "\n",
        "<a href=\"https://bsky.app/intent/compose?text=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/open-models/use-cases/model_garden_litellm_inference.ipynb\" target=\"_blank\">\n",
        "  <img width=\"20px\" src=\"https://upload.wikimedia.org/wikipedia/commons/7/7a/Bluesky_Logo.svg\" alt=\"Bluesky logo\">\n",
        "</a>\n",
        "\n",
        "<a href=\"https://twitter.com/intent/tweet?url=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/open-models/use-cases/model_garden_litellm_inference.ipynb\" target=\"_blank\">\n",
        "  <img width=\"20px\" src=\"https://upload.wikimedia.org/wikipedia/commons/5/5a/X_icon_2.svg\" alt=\"X logo\">\n",
        "</a>\n",
        "\n",
        "<a href=\"https://reddit.com/submit?url=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/open-models/use-cases/model_garden_litellm_inference.ipynb\" target=\"_blank\">\n",
        "  <img width=\"20px\" src=\"https://redditinc.com/hubfs/Reddit%20Inc/Brand/Reddit_Logo.png\" alt=\"Reddit logo\">\n",
        "</a>\n",
        "\n",
        "<a href=\"https://www.facebook.com/sharer/sharer.php?u=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/open-models/use-cases/model_garden_litellm_inference.ipynb\" target=\"_blank\">\n",
        "  <img width=\"20px\" src=\"https://upload.wikimedia.org/wikipedia/commons/5/51/Facebook_f_logo_%282019%29.svg\" alt=\"Facebook logo\">\n",
        "</a>"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "84f0f73a0f76"
      },
      "source": [
        "| Author |\n",
        "| --- |\n",
        "| [Eliza Huang](https://github.com/lizzij) |"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "tvgnzT1CKxrO"
      },
      "source": [
        "## Overview\n",
        "\n",
        "This notebook demonstrates how to:\n",
        "- Deploy an open-source LLM (e.g. DeepSeek, Qwen, LLaMA, Gemma) via [Vertex AI Model Garden](https://cloud.google.com/vertex-ai/docs/generative-ai/model-garden).\n",
        "- Serve the model with a public Vertex AI endpoint.\n",
        "- Connect it to [LiteLLM](https://docs.litellm.ai/) using an OpenAI-compatible schema for chat completion and function calling.\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "61RBz8LLbxCR"
      },
      "source": [
        "## Get started"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "No17Cw5hgx12"
      },
      "source": [
        "### Install Vertex AI SDK and other required packages\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "tFy3H3aPgx12"
      },
      "outputs": [],
      "source": [
        "%pip install --upgrade --force-reinstall --quiet 'google-cloud-aiplatform>=1.93.1' 'litellm' 'openai' 'google-auth' 'requests'"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "dmWOrTJ3gx13"
      },
      "source": [
        "### Authenticate your notebook environment (Colab only)\n",
        "\n",
        "If you're running this notebook on Google Colab, run the cell below to authenticate your environment."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "NyKGtVQjgx13"
      },
      "outputs": [],
      "source": [
        "import sys\n",
        "\n",
        "if \"google.colab\" in sys.modules:\n",
        "    from google.colab import auth\n",
        "\n",
        "    auth.authenticate_user()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "DF4l8DTdWgPY"
      },
      "source": [
        "### Set Google Cloud project information\n",
        "\n",
        "To get started using Vertex AI, you must have an existing Google Cloud project and [enable the Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com).\n",
        "\n",
        "Learn more about [setting up a project and a development environment](https://cloud.google.com/vertex-ai/docs/start/cloud-environment)."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "Nqwi-5ufWp_B"
      },
      "outputs": [],
      "source": [
        "# Use the environment variable if the user doesn't provide Project ID.\n",
        "import os\n",
        "\n",
        "import vertexai\n",
        "\n",
        "PROJECT_ID = \"[your-project-id]\"  # @param {type: \"string\", placeholder: \"[your-project-id]\", isTemplate: true}\n",
        "\n",
        "if not PROJECT_ID or PROJECT_ID == \"[your-project-id]\":\n",
        "    PROJECT_ID = os.environ.get(\"GOOGLE_CLOUD_PROJECT\")\n",
        "    if PROJECT_ID:\n",
        "        PROJECT_ID = str(PROJECT_ID)\n",
        "\n",
        "LOCATION = os.environ.get(\"GOOGLE_CLOUD_REGION\", \"us-central1\")\n",
        "\n",
        "vertexai.init(project=PROJECT_ID, location=LOCATION)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "haaUJ5VaWajl"
      },
      "source": [
        "### Import libraries"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "kCp1oCOyWfZe"
      },
      "outputs": [],
      "source": [
        "import os\n",
        "import re\n",
        "\n",
        "from vertexai import model_garden"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "fvVKEpROoXW5"
      },
      "source": [
        "### Define helpers"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "9AoCHkCNoaLG"
      },
      "outputs": [],
      "source": [
        "def print_models(data_list: list[str], items_per_line: int = 2) -> None:\n",
        "    \"\"\"Prints the list with a specified number of items per line with index and emojis,\n",
        "    and includes the total count.\"\"\"\n",
        "    print(\"🌟--- Models available ---🌟\")\n",
        "    print(\"\\n\")\n",
        "    print(f\"🔢 Total models: {len(data_list)} 🔢\\n\")  # Print the count here\n",
        "\n",
        "    for i, item in enumerate(data_list):\n",
        "        print(f\"✨ {item} \", end=\"\")\n",
        "        if (i + 1) % items_per_line == 0:\n",
        "            print()\n",
        "        else:\n",
        "            print(\" --- \", end=\"\")\n",
        "\n",
        "    if len(data_list) % items_per_line != 0:\n",
        "        print()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "dJmOAajdW7qQ"
      },
      "source": [
        "## Find the models that you can deploy\n",
        "\n",
        "In Vertex AI Model Garden, you can discover and deploy a wide range of open-source models.\n",
        "\n",
        "Many of these models are directly supported in Vertex AI Model Garden with some pre-configured for optimized deployment on Vertex AI. When the open model is not available in Vertex AI Model Garden or you want to deploy your model from HF hub, you can leverage the Hugging Face gallery which gives you access to more that 1M models.\n",
        "\n",
        "With the Vertex AI Model Garden SDK, you can browse and deploy supported models, including those from the Hugging Face gallery. You can also filter by model name or type to find the best fit for your use case.\n",
        "\n",
        "Let's explore which Llama models are available in Vertex AI Model Garden."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "2l5LsbqTevUr"
      },
      "outputs": [],
      "source": [
        "model_garden_models = model_garden.list_deployable_models(\n",
        "    model_filter=\"Llama\", list_hf_models=False\n",
        ")"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "z5Fa1spZr2YZ"
      },
      "outputs": [],
      "source": [
        "print_models(model_garden_models, items_per_line=3)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "EWRuFtX4iXW4"
      },
      "source": [
        "To include Llama models that are available via Hugging Face Gallery, you can enable `list_hf_models` flag."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "e9zjeZ-rrPPk"
      },
      "outputs": [],
      "source": [
        "deployable_models = model_garden.list_deployable_models(\n",
        "    model_filter=\"llama\", list_hf_models=True\n",
        ")"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "On3fzp-erTse"
      },
      "outputs": [],
      "source": [
        "print_models(deployable_models)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "3BOPydegEkP2"
      },
      "source": [
        "## Deploy your 1st Model Garden model\n",
        "\n",
        "In this section, you'll deploy an open-source model from Vertex AI Model Garden to a public endpoint. You can use the Cloud Console or Vertex AI SDK for deployment.\n",
        "\n",
        "For more advanced workflows (e.g., custom hardware, model tuning, or shared endpoints), refer to the [official SDK notebook](https://github.com/GoogleCloudPlatform/generative-ai/blob/main/open-models/get_started_with_model_garden_sdk.ipynb)."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "4qGzyKIPrkFE"
      },
      "outputs": [],
      "source": [
        "model_id = \"meta/llama3_1@llama-3.1-8b-instruct\"\n",
        "\n",
        "model = model_garden.OpenModel(model_id)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "vSFfpZ7Nxlkt"
      },
      "source": [
        "### Check the deployment configuration"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "cA_yIRK-t_2T"
      },
      "source": [
        "After you initiate the model, use `list_deploy_options()` method to discover the verified deployment configurations supported by a specific model.\n",
        "\n",
        "This is important to verify if you have enough resources to deploy the model."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "PaOvRoFZt9T1"
      },
      "outputs": [],
      "source": [
        "deploy_options = model.list_deploy_options(concise=True)\n",
        "print(deploy_options)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "C3OKT16hugDY"
      },
      "source": [
        "### Deploy the model\n",
        "\n",
        "Now that you know how the model will be deployed, let's use the `deploy()` method to serve the selected open model to a Vertex AI Endpoint. Depending on the model, the deployment would require some minutes.\n",
        "\n",
        "> **Note**: If the model has an End User License Agreement (EULA), you can accept it using `accept_eula` flag.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "0YqJ9pKi0cjz"
      },
      "outputs": [],
      "source": [
        "endpoint = model.deploy(\n",
        "    accept_eula=True,\n",
        "    serving_container_image_uri=\"us-docker.pkg.dev/vertex-ai/vertex-vision-model-garden-dockers/pytorch-vllm-serve:20241016_0916_RC00_maas\",\n",
        "    machine_type=\"g2-standard-12\",\n",
        "    accelerator_type=\"NVIDIA_L4\",\n",
        "    accelerator_count=1,\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "sai-h7DryWKi"
      },
      "source": [
        "## LiteLLM Inference"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Set up LiteLLM\n",
        "\n",
        "Configure [LiteLLM](https://docs.litellm.ai/) to route OpenAI-compatible API calls (e.g. chat, function calling) to your Vertex AI endpoint. Use the `vertex_ai/openai/{endpoint_id}` format to enable OpenAI-style behavior. Other formats like `vertex_ai/gemini/{endpoint}` are also supported — see the full list [here](https://litellm-api.up.railway.app/)."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "def _setup_model_garden_endpoint(endpoint_resource_name: str) -> str:\n",
        "    \"\"\"Sets up the model garden endpoint by extracting project, location, and endpoint ID.\n",
        "\n",
        "    Args:\n",
        "        endpoint_resource_name: The full resource name of the endpoint,\n",
        "                                 e.g., \"projects/{project}/locations/{location}/endpoints/{endpoint}\".\n",
        "\n",
        "    Returns:\n",
        "        A formatted string for the endpoint, e.g., \"vertex_ai/openai/{endpoint_id}\".\n",
        "\n",
        "    Raises:\n",
        "        ValueError: If the endpoint_resource_name does not match the expected format.\n",
        "    \"\"\"\n",
        "    _MODEL_GARDEN_ENDPOINT_REGEX = r\"^projects/([^/]+)/locations/([^/]+)/endpoints/([^/]+)$\"\n",
        "\n",
        "    match = re.fullmatch(_MODEL_GARDEN_ENDPOINT_REGEX, endpoint_resource_name)\n",
        "    if not match:\n",
        "        raise ValueError(\n",
        "            f\"Invalid model garden endpoint: '{endpoint_resource_name}'. \"\n",
        "            \"Please use the format 'projects/{{project}}/locations/{{location}}/endpoints/{{endpoint}}'.\"\n",
        "        )\n",
        "\n",
        "    project_id, location_id, endpoint_id = match.groups()\n",
        "\n",
        "    os.environ.setdefault(\"GOOGLE_GENAI_USE_VERTEXAI\", \"True\")\n",
        "    os.environ[\"VERTEXAI_PROJECT\"] = project_id\n",
        "    os.environ[\"VERTEXAI_LOCATION\"] = location_id\n",
        "    os.environ[\"LITELLM_LOG\"] = \"DEBUG\" # Consider if this is always desired in production\n",
        "\n",
        "    return f\"vertex_ai/openai/{endpoint_id}\"\n",
        "\n",
        "deployed_model = _setup_model_garden_endpoint(endpoint.resource_name)\n",
        "print(f\"The current model is: {deployed_model}\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Chat Completion via LiteLLM\n",
        "\n",
        "This section demonstrates how to send role-based chat messages to your Vertex AI–hosted model using LiteLLM’s `completion` API. For examples and formatting guidance, refer to the [LiteLLM input documentation](https://docs.litellm.ai/completion/input)."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from litellm import completion\n",
        "\n",
        "response = completion(\n",
        "    model=deployed_model,\n",
        "    messages=[\n",
        "        {\"role\": \"system\", \"content\": \"You are a helpful assistant.\"},\n",
        "        {\"role\": \"user\", \"content\": \"What's the capital of Japan?\"}\n",
        "    ]\n",
        ")\n",
        "\n",
        "print(response[\"choices\"][0][\"message\"][\"content\"])"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Function Calling via LiteLLM\n",
        "\n",
        "Here we use OpenAI-style function calling with LiteLLM and a Vertex AI–hosted model. You’ll define a function schema, and the model will return structured arguments suitable for downstream execution. For reference, see [`test_parallel_function_call`](https://docs.litellm.ai/docs/completion/function_call#full-code---parallel-function-calling-with-gpt-35-turbo-1106) in the LiteLLM documentation."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import google.auth\n",
        "\n",
        "creds, project = google.auth.default()\n",
        "auth_req = google.auth.transport.requests.Request()\n",
        "creds.refresh(auth_req)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import litellm\n",
        "import json\n",
        "\n",
        "# set openai api key\n",
        "import os\n",
        "os.environ['OPENAI_API_KEY'] = creds.token\n",
        "\n",
        "# Example dummy function hard coded to return the same weather\n",
        "# In production, this could be your backend API or an external API\n",
        "def get_current_weather(location, unit=\"fahrenheit\"):\n",
        "    \"\"\"Get the current weather in a given location\"\"\"\n",
        "    if \"tokyo\" in location.lower():\n",
        "        return json.dumps({\"location\": \"Tokyo\", \"temperature\": \"10\", \"unit\": \"celsius\"})\n",
        "    elif \"san francisco\" in location.lower():\n",
        "        return json.dumps({\"location\": \"San Francisco\", \"temperature\": \"72\", \"unit\": \"fahrenheit\"})\n",
        "    elif \"paris\" in location.lower():\n",
        "        return json.dumps({\"location\": \"Paris\", \"temperature\": \"22\", \"unit\": \"celsius\"})\n",
        "    else:\n",
        "        return json.dumps({\"location\": location, \"temperature\": \"unknown\"})\n",
        "\n",
        "\n",
        "def test_parallel_function_call():\n",
        "    try:\n",
        "        # Step 1: send the conversation and available functions to the model\n",
        "        messages = [{\"role\": \"user\", \"content\": \"What's the weather like in San Francisco, Tokyo, and Paris?\"}]\n",
        "        tools = [\n",
        "            {\n",
        "                \"type\": \"function\",\n",
        "                \"function\": {\n",
        "                    \"name\": \"get_current_weather\",\n",
        "                    \"description\": \"Get the current weather in a given location\",\n",
        "                    \"parameters\": {\n",
        "                        \"type\": \"object\",\n",
        "                        \"properties\": {\n",
        "                            \"location\": {\n",
        "                                \"type\": \"string\",\n",
        "                                \"description\": \"The city and state, e.g. San Francisco, CA\",\n",
        "                            },\n",
        "                            \"unit\": {\"type\": \"string\", \"enum\": [\"celsius\", \"fahrenheit\"]},\n",
        "                        },\n",
        "                        \"required\": [\"location\"],\n",
        "                    },\n",
        "                },\n",
        "            }\n",
        "        ]\n",
        "        response = litellm.completion(\n",
        "            model=deployed_model,\n",
        "            messages=messages,\n",
        "            tools=tools,\n",
        "            tool_choice=\"auto\",  # auto is default, but we'll be explicit\n",
        "        )\n",
        "        print(\"\\nFirst LLM Response:\\n\", response)\n",
        "        response_message = response.choices[0].message\n",
        "        tool_calls = response_message.tool_calls\n",
        "\n",
        "        print(\"\\nLength of tool calls\", len(tool_calls))\n",
        "\n",
        "        # Step 2: check if the model wanted to call a function\n",
        "        if tool_calls:\n",
        "            # Step 3: call the function\n",
        "            # Note: the JSON response may not always be valid; be sure to handle errors\n",
        "            available_functions = {\n",
        "                \"get_current_weather\": get_current_weather,\n",
        "            }  # only one function in this example, but you can have multiple\n",
        "            messages.append(response_message)  # extend conversation with assistant's reply\n",
        "\n",
        "            # Step 4: send the info for each function call and function response to the model\n",
        "            for tool_call in tool_calls:\n",
        "                function_name = tool_call.function.name\n",
        "                function_to_call = available_functions[function_name]\n",
        "                function_args = json.loads(tool_call.function.arguments)\n",
        "                function_response = function_to_call(\n",
        "                    location=function_args.get(\"location\"),\n",
        "                    unit=function_args.get(\"unit\"),\n",
        "                )\n",
        "                messages.append(\n",
        "                    {\n",
        "                        \"tool_call_id\": tool_call.id,\n",
        "                        \"role\": \"tool\",\n",
        "                        \"name\": function_name,\n",
        "                        \"content\": function_response,\n",
        "                    }\n",
        "                )  # extend conversation with function response\n",
        "            second_response = litellm.completion(\n",
        "                model=deployed_model,\n",
        "                messages=messages,\n",
        "            )  # get a new response from the model where it can see the function response\n",
        "            print(\"\\nSecond LLM response:\\n\", second_response)\n",
        "            return second_response\n",
        "    except Exception as e:\n",
        "      print(f\"Error occurred: {e}\")\n",
        "\n",
        "test_parallel_function_call()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "2a4e033321ad"
      },
      "source": [
        "## Cleaning up"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "N5UrKKMifeGi"
      },
      "outputs": [],
      "source": [
        "delete_endpoints = True\n",
        "\n",
        "if delete_endpoints:\n",
        "    endpoint.delete(force=True)"
      ]
    }
  ],
  "metadata": {
    "colab": {
      "name": "get_started_with_model_garden_sdk.ipynb",
      "toc_visible": true
    },
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}
